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ABSTRACT 

We present a novel method for the optimal selection of quasars using time- 
series observations in a single photometric bandpass. Utilizing the damped ran- 



X 



dom walk model of Kelly et al. (2009), we parameterize the ensemble quasar 



structure function in Sloan Stripe 82 as a function of observed brightness. The 
ensemble model fit can then be evaluated rigorously for and calibrated with indi- 
vidual light curves with no parameter fitting. This yields a classification in two 
Ph statistics — one describing the fit confidence and one describing the probability 

of a false alarm — which can be tuned, a priori, to achieve high quasar detection 
fractions (99% completeness with default cuts), given an acceptable rate of false 
alarms. We establish the typical rate of false alarms due to known variable stars 
as ^ 3% (high purity). Applying the classification, we increase the sample of 
potential quasars relative to those known in Stripe 82 by as much as 29%, and 
by nearly a factor of two in the redshift range 2.5 < z < 3, where selection by 
i color is extremely inefficient. This represents 1875 new quasars in a 290 deg 2 

field. The observed rates of both quasars and stars agree well with the model 
predictions, with > 99% of quasars exhibiting the expected variability profile. 
We discuss the utility of the method at high redshift and in the regime of noisy 
and sparse data. Our time-series selection complements well independent selec- 
tion based on quasar colors and has strong potential for identifying high-redshift 
quasars for BAO and other cosmology studies in the LSST era. 



Subject headings: quasars: general — stars: variables: other — methods: statis- 
tical — cosmology: miscellaneous 
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Introduction 



Active Galactic Nucleii (AGN) — and quasars (QSOs) in particular — continue to play 
a central role in modern astrophysics. AGN emission is the hallmark of supermassive black 
hole growth. Powerful AGN outflows of photons and matter affect the Universe on small 
(accretion disk) and large (galactic, extragalactic) size scales, and there is now substantial 
evidence both from observations and simulations (both semi-analytical and hydrodynamical) 
that a "quasar mode" is a key part of massive galaxy formation and evolution. On an ensem- 
ble basis, the observed number densities (e.g., Croom et al.|200 9), large scale-structure (e.g., 



Shen et al. 2007, 2009 Ross et al. 2009), and use as a cosmological probes via the Lyman-a 



forest and Baryon Acoustic Oscillations (BAO; e.g., McDonald & Eisenstein 2007), demon- 
strate that luminous AGN — despite their rare nature — provide elucidating constraints on 
some of the grandest questions of our time. 

To fully exploit the potential of active galaxies, astronomers must identify and charac- 
terize the physics of large, representative samples. Recent surveys have identified quasars 



using spectroscopy (e.g., Schneider et al. 2010), color-selection (e.g., Richards et al. 2009), 



X-ray detections (e.g., Bauer et al. 2004), and optical variability (e.g., Schmidt et al. 2010), 
among other methods. The quasar catalogs generated by these surveys exhibit a range of 
different completeness and efficiency characteristics. This is partly due to differences in tele- 
scope sensitivities for the different surveys, but also to intrinsic differences in the physics 
that each survey samples. Deep X-ray surveys and optical variability surveys currently show 



the most promise for identifying large numbers of AGN per square degree of sky (Brandt & 



Hasinger 2005). 



Optical surveys most often utilize quasar photometric colors to separate quasars from 
field stars. Typical "completeness" fractions — the fraction of retained spectroscopically- 
confirmed quasars — for the selection of quasars based on color are ^ 70% (e.g., Richards 
et al. 2001), although this can decrease dramatically, to 10 — 50%, in the redshift range 



(2.5 < z < 3) where the quasar UV color excess (u-g color) is similar to that of stars (Richards 



et al. 2006 ) or for highly extinguished quasars. The "purity" of selection, or the fraction of 



false alarms due to spectroscopically-confirmed stars, can be similarly low. Optical variability 
provides an independent selection criteria and is becoming an increasingly important survey 
technique. Quasar selection based on optical variability is a key component of current and 
upcoming missions, such as the Panoramic Survey Telescope & Rapid Response System 
(PanSTARRS; Kaiser et al.||2002 ) and the Large Synoptic Survey Telescope (LSST; Ivezic et 



al. 2008; Abell et al. 2009). Indeed, with the recent endorsement by the Astro2010 Decadal 



Survey of both ground and space-based wide-field synoptic surveys, exploring discovery in 
the time domain is particularly timely. 
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Quasar fluxes observed in optical passbands (e.g., Matthews &: Sandage||1963~ ) meander 
in time, non-periodically, with flux differences that tend to be larger on larger timescales (e.g., 



Hook et al. 1994). This has historically been characterized in studies of auto-correlation using 



the so-called "structure function" (e.g., Simonetti, Cordes & Heeschen 1985; Hughes, Aller 



& Aller 1992), evaluating the average square variations versus timescale for an ensemble 



of quasars. It is generally assumed that the combination of observations from individual 
sources, each perhaps observed only a pair or a few times, will accurately describe the 
intrinsic variability of a given quasar. Data from the Sloan Digital Sky Survey (SDSS; e.g., 
Abazajian et al.||2009 ), and high-time-cadence observations for Stripe 82Qin particular, have 
allowed major advances in quantifying and understanding the nature of this variability (e.g., 
Vanden Berk 2004 Ivezic et al. 2004), particularly for individual objects (e.g., MacLeod et 



aL]|2010 Sesar et aL]|2010 ) . We explore in detail below the connection between the ensemble 
variability and the variability of individual, well-sampled quasars. 

The correlated variability of quasars is unique compared to most other variable objects 
(e.g., stars) which tend on long-timescales (e.g., long relative to a periodicity timescale) to 
exhibit non-correlated variability. Quasars tend to vary much less on monthly and shorter 
timescales as compared to yearly timescales, unlike most stars (excluding, e.g., long-period 
variables). This distinction motivates the possibility of using quasar time series modelling to 
classify quasar and differentiate them from other objects. Such a classification, particularly if 
it can be done efficiently, with few data, could have tremendous benefit for selecting quasars 



for spectroscopic followup and use in cosmological studies (e.g., of BAO; McDonald & Eisen- 



stem 



2007[). Progress towards these ends is now becoming possible thanks to the generative 

a model capable of stochastically pro- 
uncovered by 



"damped random walk" quasar light curve model 

ducing a quasar like light curve, in this case with only 2 input parameters 



Kelly et al. (2009). Kozlowski & Kochanek (2010) have shown that this model accurately 



describes the light curves of 100 well-sampled quasars, and it can be used to separate these 



from stars. MacLeod et al. (2010) have shown that the model accurately describes individ- 



ual quasars in Stripe 82. We show that it can be used to accurately describe the ensemble 
variability as well as the individual variability. 

Below, we discuss a novel method to fit the structure function for Stripe 82 using the 
damped random walk model (Section[2]). We then show (Section[3]) how the fit of this average 
quasar model can be rigorously evaluated for individual light curves to separate quasars from 
stars with no parameter fitting. We show that nearly all known quasars (> 99%; Section|4]) in 



lr The equatorial Stripe 82 region (20h 24m < R.A. < 04h 08m, -1.27 deg < Dec < +1.27 deg, ~ 290 
dcg 2 ) was repeatedly observed — 58 imaging runs from 1998 September to 2004 December — with 1-2 
observations per week, each Fall. 
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Stripe 82 show the telltale variability signature, with a very high completeness (~ 99%) that 
can be estimated a-priori. This offers a substantial improvement over ad-hoc methods (e.g., 



90% completeness in Schmidt et al. 2010) for applying structure function fits to individual 
light curves. The fraction of stars which could be confused as quasars is, likewise, very small 
(< 3%) and is in reasonable agreement with a-priori estimates. 

The method we outline below is based on maximum likelihood principles; it is therefore 
theoretically optimal and applicable to any field or survey. The quasar probabilities returned 
by the method robustly take into account model uncertainties and uncertainties (both sta- 
tistical and systematic) in the data. The method can be applied in the limit of very few data 
points (2 or more), because there are no free parameters to fit and can provide key additional 
leverage to aid color-based selection schemes for future surveys. The methodology is easily 
adaptable to include observations in multiple photometric filters and to avoid contamination 
from spurious data. 



2. Data Selection and Ensemble Quasar Variability 



The majority of work presented herein makes use of the ugriz photometry from the 



Sesar et al. (2007) variable source catalog. That catalog contains 67,507 g < 20.5 objects in 



Sloan Stripe 82. These are selected based on the presence of statistically significant temporal 
variability (> 0.05 mag RMS and xt/ u > 3 in g and r). Additional details regarding the 



survey selection criteria can be found in Sesar et al. (2007). The variability selection retains 



^ 90% of known (spectroscopic) quasars in the field. Those spectroscopically identified 
quasars were almost all entirely targeted as a result of color selection from the main SDSS 
survey (and not because of time-domain characteristics). 

Our goal is to differentiate quasars from other field sources using variability metrics 
alone, without regard to color or cross- correlation with surveys at other wavebands. We 
begin by evaluating the ensemble variability of 6304 spectroscopically-confirmed quasars in 



Sesar et al. (2007) to quantify the observed range of magnitude change, Amag, as a function 
of timescale Tij = U 



tj between measurements at times ti and tj. 



Figure [T] shows a histogram of all magnitude differences in g-band for measurements 
separated on timescales r (in days) between 1.5 < log (t) < 1.8. The histogram is apparently 
symmetric, peaked at zero, and has broad, exponential wings. The exponential distribution 
here is commonly found (e.g., Ivezic et al. 2004). We find that the histogram can be well 



modelled as a sum of zero-mean Gaussians whose widths scale logarithmically with the quasar 
magnitude. Residual non-Gaussianity (i.e., an excess of large fluctuations as compared to 



-5 - 



the predictions from a Gaussian) is due to a small fraction (2%) of sources with excess 
variability (see also, Section 5.2[ ). Figure [2] plots the Gaussian width — after removing the 
measurement uncertainty — as a function of magnitude and r. This is commonly known as 
the first order quasar structure function SF T 



CAmag(r) (e.g. 



Simonetti, Cordes & Heeschen 



1985 Hughes, Aller & Aller 1992). 



To move from the ensemble variability to the variability of a given object, it is necessary 
to treat the covariances between neighboring points — which are estimated from the same 
data — in the structure function. Given the approximate Gaussianity of the ensemble, it 
is natural to fit a Gaussian random process to individual objects. Consistent with previous 



works (Kelly et al. 2009 Kozlowski & Kochanek 2010 MacLeod et al. 2010), we find that 



the quasar variability as a function of time difference is well-modelled using a covariance 
matrix of the form: 



Cij = a^Sij + ^a 2 r exp (-Xy/r ), 



X 



where cij is the measurement uncertainty for the i'th observation, r D is an exponential damp- 
ing timescale (units of days), a 2 is the intrinsic variance between observations on short 
timescales « 1 day, and 8ij is the Kronecker delta function (1 for % = j, otherwise). 
This model predicts SF T oc ctt^ 2 \Y — exp (— t^/to))] 1 / 2 , which rises oc t\- 2 on short timescales 
{jij <C r ). This model is plotted over the data in Figure [2j To obtain an acceptable fit, 
we allow a 2 and r D to vary logarithmically with the median quasar magnitude. The best fit 
scalings for the ugriz bands are reported in Table 1. Python software to calculate the fits 
and the quality statistics discussed below can be downloaded from the project webpaga^] 



2 http: / / astro.berkeley.edu/~nat / qso_selection 



Table 1: Structure Function Parametrization 



Filter 


a i 


a 2 


a 3 


a,\ 


u 


-3.90 


0.12 


2.73 


-0.02 


9 


-4.10 


0.14 


2.92 


-0.07 


r 


-4.34 


0.20 


3.12 


-0.15 


i 


-4.23 


0.05 


2.83 


0.07 


z 


-4.44 


0.13 


3.06 


-0.07 



Notes: SF T = <tt 1//2 [1 - exp (-t/to))] 1 / 2 , with log(<r 2 ) = a\ + a 2 (mag - 19), log(r D ) = a 3 + 
04(mag — 19). Statistcal uncertainties in the above parameters are of order 10 -4 and are negligble. 
Magnitudes are in the AB system, uncorrected for Galactic extinction. 
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i r 




A mag 

Fig. 1. — The histogram of magnitude differences in g-band observed for 6304 quasars on timescales 
10 15 < r < 10 1 - 8 days. This is a time range which is particularly well-sampled for Stripe 82. The 
distribution has broader wings than a Gaussian (red curve); however it is well- represented by the 
superposition of Gaussian's (blue curve) with widths that increase logarithmically (Table 1) with 
the quasar magnitude. Residual non-Gaussianity can be attributed to a fraction 2% of quasars 
which exhibit excess variability. 
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Fig. 2. — The ensemble structure function SF T for quasars in SDSS Stripe 82 in g-band. Each 
bin corresponds to a set of order 10 4 difference measurements. The maximum likelihood Gaussian 
variance in addition to the measurement uncertainty is determined for each pair, and the median 
and standard deviation of these values for each set is plotted. The data points are well-fit (dotted 
curves) by the damped random walk model (see text) with parameters that vary with quasar 
magnitude only to reproduce the observed increase in variability for faint sources (Table 1). There 
is modest evidence (shortest timescale points) for untreated systematic measurement uncertainty 
at the ^ 1% level. 
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log(^) 




Fig. 3. — (A) Exponential damping timescale r D for the random walk SF T model versus the short- 
timescale variance a 2 . Spectroscopically-confirmed quasars appear in the top left of the plot, while 
spectroscopically identified stars appear in the bottom right. High redshift (z > 3) quasars tend 
to have weak, short-timescale variability (see text). (B) The short-timescale variability scales with 
intrinsic brightness log (<r 2 ) = 0.17G — 0.1, with large scatter. High redshift objects appear toward 
the left due to the survey flux limit g < 20.5). Low redshift objects appear toward the right. 
Typical uncertainties are ±1 dex in r D and a 2 (90% confidence). 
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We can also fit the SF T model directly by maximizing the posterior probability of the 
model given the data (see, also, Kozlowski & Kochanek 2010). The best-fit a 2 and r D values 
for each quasar are plotted in Figure [3]A. for g band. Note that consistent scalings with the 
g band magnitude (Table 1) are found. 

The origin of the scaling of variability with source brightness appears to stem from 



the well-known anti-correlation between intrinsic brightness and variability (e.g., Ivezic et 
al. 2004). Figure [3b displays this trend for the absolute magnitude in g-band, M„. We 



note that a 2 no longer correlates with g-band magnitude or redshift after subtracting away 
this trend. High-redshift quasars tend to appear toward the left of the plot (Figure [3^>) - 
weak short-timescale variability — due to the g = 20.5 flux limit. The effects of a declining 
luminosity function (and potentially source evolution) also work to keep low-redshift objects 
toward the right of the plot. The normalization of the fit in Figure [3)3 can be seen to vary 
slightly with redshift also as a result of this flux limit, and we can expect the apparent 
magnitude scalings (Table 1) to also have some dependence on redshift and the survey 
selection (see, Section 5.3). These shifts are small compared to the (apparently intrinsic) ~1 
dex scatter in the scalings (e.g., Figure |3 



3. Quasar Variability Selection Formalism 



We can regard the parameterization of the SF T using the damped random walk model 
as a rigorous mathematical approach to evaluating the quasar likelihood for a given light 
curve, which takes into account all correlations in the data. Employing the mathematical 



prescription adapted from Rybicki & Press (1994) by Kozlowski & Kochanek (2010), we 
write for the probability of the data x given the quasar variance model C = C(& 2 , r Q ): 

P(x\a 2 ,T ) oc |C|" 1/2 exp[-0.5(x-x o ) T C" 1 (x-x o )]. (2) 

We can marginalize analytically over x Q and replace the exponent in Equation [2] with 



-0.5* 



QSO 



0.5(x — £ ,best) C [x — X ,best) 5 



(3) 



where x ,best = Yli j ^ij lx j/ J2i j Cm '■ The inverse of C is tridiagonal (Rybicki & Press 1994) 



which allows for rapid O(N) computation. Given the parameterization above (Table 1) for 
r and a 2 in terms of apparent magnitude mag, we can then directly evaluate Xoso( ma §) 
for all objects of interest with no additional fitting (i.e., no fitting beyond the fitting of x Q ). 

For a quasar with the mean ensemble variability, Xqso should be \ 2 V distributed with v 
degrees of freedom, where v equals the number of data points minus one. The most likely 
value is Xqso = v - The expected distribution of Xqso f° r a temporally-uncorrelated source 
can be evaluated by Monte Carlo or estimated quickly as we now discuss. 
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3.1. Significance Estimates 

The expected value for Xqso f° r a source that is not a quasar but varies in a time- 
independent fashion with Gaussian scatter o m is 

E[x 2 QSO ] « E[x 2 \Tr{C^) = o* TrCC" 1 ), (4) 

where is the expectation value of x and Tr() is the matrix trace operation. In Equation 
[4j we assume that all observations have approximately the same uncertainty, and we ignore 
the light curve mean. Because off-diagonal terms in C~ x do not contribute, Xqso * n Equation 
[4] is effectively a sum of the squares of N Gaussian random variables — each with zero mean 
and standard deviation cr m , which includes the measurement uncertainty — multiplied by a 
constant, Tr{C~ l )/N . Therefore, the quantity iV _ XQ SO /[cr^ l Tr(C _1 )] will be xl distributed, 
where the number of degrees of freedom v = N—l. (The missing degree of freedom represents 
the light curve mean which we have neglected to write down). 

The true value of must be determined from the data. In the case again of equal 
uncertainty on all data points, 

P{a 2 m \x) oc a m N exp(-0.5Nv x /a 2 J, (5) 

where v x = (x 2 ) — (x) 2 . Multiplying the xl probability density for AfxQ SO /[er^Tr(C'~ 1 )] 
by P^a^x) and integrating over <x m , we find a Beta distribution for the Null-hypothesis 
distribution of Xqso gi ven the data: 

P (XQso\ x > not quasar) oc (y[l -y]) {v ~ 1)/2 } (6) 

with y = Xqso/[Xqso + v xTr{C~ 1 )]. This reference distribution approximates well the 
observed x%$o frequencies for stars (Figure [6j) and can be used as the reference distribution 
to calculate the significance of a given Xqso value. It is useful also to define 

2 _ VsTrjC- 1 ) 

XFalsc/^ = "2 ( 7 ) 

AQSO 

The quantity xlaise w ^ n be small and of order v for a time- independent, non-quasar variable 
source (i.e., a potential false alarm). 



3.2. Confidence Estimates 



As we discuss above, the Xqso calculated for a hypothetical mean quasar — varying 
according to Equation [l] with the best-fit parameters in Table 1 defining a 2 and r D — will 
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be of order v — N — 1 and Xqso wm follow a xl distribution. However, the variability of 
a given quasar may depart from the mean sample variability. We can allow for this on a 
sour ce-by- source basis by replacing C — )■ C / s 2 in Equation [2} The scale factor s is a fudge 
factor that can be marginalized over to allow the mean quasar model to acceptably fit each 
quasar light curve. 

The distribution of s given the data for a given quasar is given by Equation [5j replacing 
a m with s and Nv x with the most likely value for Xqso of v. In an argument parallel to 
that above used to derive significance, the xl probability density describing Xqso/ s2 can °e 
convolved with P(s 2 \x) to find a reference Beta distribution to evaluate quasar confidence: 

P(x^sok, quasar) ex (y'[l - y'])^' 2 , (8) 

with y' = v/[v + Xqso]- Likewise for P(xq SO |:e, not quasar) for stars, this reference distri- 
bution approximates well the observed Xqso frequencies for quasars (Figure |6j). 

Note that because the reference distribution for evaluating significance has the same form 
as that used to evaluate confidence, a curve of constant ^Qso/^Faise represents a curve of 
equal odds in favor of the hypothesis quasar versus not-quasar, given equal prior information. 



4. Quasar Variability Selection Application 



In addition to 6573 quasars from Schneider et al. (2010), 3020 labelled stars (i.e., 
spectroscopically-confirmed) in the DR7 are plotted in Figure [3]A We note that the to- 
tal number of objects now known to be quasars and used in these plots (and those below) 



is larger than the quasar sample of 6307 from Sesar et al. (2007) used above to establish 



SF T . Known stars tend to have r D <^ 100 days (i.e., approximately temporally-uncorrelated 
variability) and also increased variability on short timescales relative to quasars (see, also, 
MacLeod et al. 2010). In principle, Figure [3~K can be used for classification of unlabeled 



sources as well (e.g., Kozlowski & Kochanek 2010); however, we and others find the uncer- 



tainties in these parameters to be large (often a factor of ten even for well-observed sources). 
Also, there is significant overlap between the populations of labelled sources. A more optimal 
separation could be developed by seeking to fit no free parameters. 



Figure [4] shows Xqso ano ^ XFaise f° r a ^ 17,588 sources in 



Sesar et al. 



(2007) with 50 or 



more observations in g-band. Roughly 70% of all sources in Stripe 82 — excluding 290 < 
R.A.< 340 degrees where the Galactic stellar contribution dominates the source counts and 
blending between sources becomes common — have 50 or more observations. There is a 
clear separation between the majority of stars and quasars, and the values of Xqso anc ^ xlaise 
take on their expected values for labelled sources. For the population cut shown in Figure 
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Fig. 4. — Non-quasar variability metric xlaisc versus quasar-like variability metric Xqso f° r 5-band 
observations of sources with > 50 epochs in Sesar et al. ( 2007 ) . Spectroscopically confirmed quasars 
(from Schneider et al. 2010) and stars are marked in red and blue, respectively. The low (z < 0.5) 
and high (z > 3) redshift quasars are additionally marked in green and orange, respectively, to 
display the trend toward low variability at high-z. The solid line (highlighted in green) represents 
a cut based on equal probability (Xqso = ^Faise) between the hypotheses. There is also a cut on 
Xpaise > 2.5, corresponding to a ^ 3.5<r significance cut (Section 3.1) against false positives. RR 
Lyrae stars (from; Sesar et al.|2010 ), which are highly variable, are also plotted as yellow diamonds. 
The percentages of objects on either side of the cut are recorded in the figure (descending in the 
same order as in the legend). 



13 



|ij the completeness fraction for the fraction of known quasars retained is 99%. One minus 
the fraction of known stars retained, the purity, is 97%. 

Of all sources, we predict that 40% are quasars, potentially increasing the overall known 
quasar sample size by 20%. The candidate quasars have a distribution in g-band magnitude 
consistent with that of the known quasars. We discuss the newly discovered quasars in more 
detail below in Section [5] 

We also show explicitly in the plots the location of low redshift (z < 0.5) and high 
redshift (z > 3) quasars, 97.8% and 94.3% of which, respectively, survive the nominal quasar 
selection. These tend to have systematically high and low Xqso va hies, respectively, due to 
the trend toward low variability at high luminosity discussed above. 

The selection can be optimized to pursue z > 3 quasars by placing a more strin- 
gent cut on Xqso^ < ^ (l° w quasar-like variability) to reduce the number of selected low- 
redshift quasars and also relaxing the cut on overall variability to allow in more false alarms, 
Xpaiae/ 1 ' > 2- The resulting completeness (purity) for selecting z > 3 quasars is 97.6% 
(95.7%), considering only contamination by stars (95% of z < 3 quasars are retained, but 
most could be rejected using color information; Figure [8]). Two of the three high-redshift 
quasars that do not survive the refined Xqso cu ^ can ^ e retained using the outlier rejection 
scheme outlined below. (These quasars both have a single deviant photometric point.) 

The classification defined by the cut in Figure |2] essentially assumes equal prior prob- 
ability that a source is or is not a quasar (Section [5]). We can see (Figure [5]) that the 
classification remains quite stable over a broad range of source densities which include the 
high stellar-density region of 290 < R.A.< 340 (degrees) and also now include sources with 



few measurement epochs (the minimum in the Sesar et al. (2007) sample is 9). Because we 
are now classifying sources with few observation epochs, we select based on the significance 
level of a given XiLise rather than on simply x| alse . The completeness (purity) we derive for 
Figure [5] is 99.1% (97.1%), relatively little changed. However, we caution that this neglects 
the fact that few sources are spectroscopically identified at high source densities p ^ 10 3 
deg -2 (R.A. w 300 degrees). It is clear that the dramatic rise in the overall source counts 
- which must be dominated by stars — would lead to marked completeness and purity 
decreases. It is, therefore, advisable in such high source density regions to apply a more 
strict cut on xlaise- in principle, prior information on the expected source density in a given 
direction for a given survey should be utilized to dial in an appropriate threshold value. 



-14- 




Fig. 5. — The non-quasar variability metric versus source density p for all sources (N > 9 epochs 
per source). Known quasars and stars that survive a > 3.5a significance cut on xlaise an< ^ a ^ so nave 
Xqso < /tlaise *° ^ e classified as quasars are plotted in red and blue, respectively. The classification 

- which assumes equal prior probability that a given source is or is not a quasar (Section [4| - 
is quite stable over a broad range in p. At very high p ;> 10 3 deg -2 there are few spectroscopic 
identifications and the stellar population begins to envelope the quasar population, demanding a 
higher level significance cut. 




Fig. 6. — Projections of the axes in Figurejijfor Xqso/ 17 (t°p) an d xlaise/^ (bottom). The dotted 
curves show the predicted Beta distributions for the source counts (see text), including an excess 
of variable objects in both plots at the 10% level. 
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4.1. Significance/Confidence Validation: Tail Populations 



Figure [6] shows the observed distributions of Xqso anc ^ xlaise- Over-plotted are the 
expected Beta distributions (Section [4]) for the median number of degrees of freedom v = 55. 
These describe the observed frequencies well, apart from a tail of 10% of variable stars 
which have xlaise/^ ~ 2 on average and a tail of 10% of highly- variable quasars which have 
Xqso/ u ~ 2 on average. These tail contributions can be used to obtain a precise significance 
estimate and rates which agree well with predictions, although this improvement has little 
impact on the logarithmic quasar/star separation in Figure |lj We note that 50% of the 
Xqso/ u > ^ objects are at low redshift (z < 1), where our approximation that variability 
scales as apparent (instead of absolute) magnitude is expected to break down. 

Taking into account the excess number of events in the tails of the observed Xqso an< ^ 
xlaise distributions (Figures [6]), a cut on either parameter at a value of 2.5 (corresponding 
to ~ 3.5a without the tail estimates) would need to be increased to ~ 4.1 to account for 
the tails at the same significance level. This corresponds to a m 5a cut on the initial 
distributions, ignoring the tails. Therefore, a conservative prescription to either reject false 
positives or to reject quasars, respectively, is to cut on x| alse or Xqso> respectively, at the 
5a significance level. At this significance level, there are few outliers: 26 stars (~ 1% of 
the sample) masquerading with Xqso < XFaise as quasars and 15 (« 0.3% of the sample) of 
overly- variable quasars, after excluding quasars at z < 1. We discuss the nature of these 



strong outliers in Section 5.2 below. 



5. Discussion 

The light curve fitting above has the potential to provide vital information for quasar 
selection, particularly for those objects which are challenging to select based on their photo- 



metric colors. Figure [7] shows the location of all Sesar et al. (2007) objects in the u — g,g — r 
color plane. Low redshift quasars tend to lie in a tight locus to the left of the plot. Stars 
run along a branch upward and to the left. High redshift (and also potentially highly ex- 
tinguished) quasars fan out to the right of the quasar locus, through the stellar region. 



Color-based selection is clearly challenging for these objects (see also, Richards et al. 2006). 
We show in Figure [8] that our method is capable of identifying a substantial number — 1875 
in this case — of highly statistically significant quasar candidates. Very few of these (~ 1%) 
lie in the color-color space typically dominated by stars (Region V in Figure [7]), and we 
suspect some of these 26 candidates are extinguished quasars. 

The selection for Figure [8] synthesizes recommendations from above. For the high- 
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Galactic latitude portion of Stripe 82 (excluding 290 < R.A.< 340 degrees), we apply a 5cr 
cut on xlaise ^° eliminate the tail of variable stars (Section 5.2 Figure |6j bottom). A higher 



significance cut (7a) is utilized for the low-Galactic latitude region to account for the higher 
stellar density (Figure [5]). Finally, we ignore sources brighter than i = 18 (corresponding 
to ~3% of the known quasar sample but likely few of the candidate quasar sample). This 
cut also helps to eliminate bright, red sources (i.e., primarily late-type stars at g — r ^ 1.3, 
u — g ^ 2 in Figure [7]), which are almost certainly not quasars. 

Quasars redshifts correlate strongly with their colors; three regions (A, B, and C) which 
contain ^ 90% of quasars in the redshift ranges z ^ 2.5, 2.5 <^ z <^ 3, and -2^3, respectively, 
are plotted in Figure [8] These regions are defined so as to maintain the total number of 
quasars per redshift bin associated with a given color-color bin. Table 2 displays the number 
of known, spectroscopic quasars in each color bin as well as the number of new candidate 
quasars discovered here. In the last two columns of the table, we make this comparison 
separately for the bright (i < 19 mag) and faint (i > 19 mag) samples. In parentheses in 
these columns, we quote the implied fractional increase in known quasars. 

We potentially increase the overall known quasar sample by 29% (1875/6573), with a 
substantial contribution (+89%) in the color-color region (B) where color selection is most 
difficult. The number of known z ^ 3 quasars would increase by 36-46%, depending upon 
whether we include 15 candidate quasars located in the stellar region V in Figure [7} These 
gains are primarily due the selection here of the faint (i > 19 mag) quasars, although 
the fractional increases for z ^ 3 quasars are relatively independent of source brightness. 
It is important to note when making these comparisons that these gains are relative to 
spectroscopically-confirmed quasars, whereas a large number of our candidates (1280) are 



also (un-observed) candidates based on color (Richards et al. 2006). The fraction of new 



Table 2: Redshift Counts from Candidate Quasars Colors 



Color Region 


# Known 


New (% Known) 


New i < 19 


New i > 19 


A (z £ 2.5) 
B (2.5 £ z £ 3) 
C (z £ 3) 


6086 
325 
140 


1498, 297 (25%, 5%) 
288, 230 (89%, 71%) 
65, 48 (46%, 33%) 


93, 44 (6%, 3%) 
43, 41 (58%, 55%) 
13, 11 (43%,37%) 


1405, 253 (30%, 6%) 
245, 189 (98%, 75%) 
52, 57(47%, 34%) 



Notes: The color regions A, B, and C are defined in the text and in Figure [7| Columns 3-5 report 
the number of new quasars relative to spectroscopically-confirmed quasars, followed after a comma 



by the number (italicized) of new quasars not already color-selected in Richards et al. (2009) or 
spectroscopically-confirmed. 
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quasars relative to color-selected quasars are quoted as the second percentage in the final 
two columns of Table 2. These numbers suggest that color-based selection is complete at the 
^ 95% level below z m 2.5 and rather incomplete at higher redshift. 

As most of the new quasar candidates come from the faint end of the detected source 
population, where the color selections are more incomplete (owing in part to the need for 
high signal-to- noise imaging in multiple filters), it is tempting to restrict to the brighter 
sources in making a direct comparison of time-domain and color selection completeness and 
purity; Table 2 provides this. However, viewing the entirety of Stripe 82 as a fixed volume of 
data, the fact that high-confidence quasars can be obtained fainter than the color-selection 
limit might be considered legitimate advantage of our technique. 

Table 3 gives the number of known and candidate sources corresponding to the 6 color- 
color regions discussed in Sesar et al. ( 2007[ ). These regions define the typical color-color 



locations of various astrophysical transients (see, Figure [7]). The relative frequencies of 
the candidate quasars falling within a given region are roughly consistent with those of the 
known quasars. We note that there is a marked, relative increase in the number of candidates 
possibly associated with stellar locus stars (region V) and RR Lyrae stars (region IV). RR 
Lyrae stars tend to be strongly rejected as quasar candidates (Figure [1]). In future work we 
will further explore what fraction of candidate quasars in the stellar locus are stars or highly 
extinguished quasars. 



5.1. Weakly Variable or Non- Variable Quasars? 



Sesar et al. (2007) determine that 93% of spectroscopically-confirmed quasars brighter 
than g,r = 19.5 in Stripe 82 are variable at the > 0.03 mag level. They report a conservative 
fraction (> 90%), which is limited by the measurement uncertainty. Fitting a detailed model 
for the expected variability, we can make a more general statement regarding the fraction of 
quasars that vary. As reported above, > 99% of quasars with 50 or more data points yield 
a highly significant xlaise- The quasar population dwindles strongly below xlaisc/^ = 2-5 (a 
3.5cr cut on variability), unlike the stellar population. The majority of stars have xiLise ~ v 
which can occur either because the variability is not qso-like or because the variability simply 
is not statistically significant beyond the measurement error. 

The fact that nearly all quasars brighter than g = 20.5 have xlaise/^ > 2-5 (99.1% 
of quasars including those with as few as 9 measurement epochs; Section [4]) indicates an 
intrinsic variability at least 60% larger than the typical measurement error of 0.02 mag. 
Quasar with weaker variability can be measured, and they are missing from Figure |4j The 
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Fig. 7. — Color-color plot showing the location of spectroscopically-confirmed stars (blue triangles) 
and quasars (red circles). All of the sources from Sesar et al. (2007) are plotted as small black dots. 
We demarcate with dashed lines the regions which Sesar et al. (2007) find to contain, primarily, 
white dwarves (I), low-redshift quasars (II), dM/WD pairs (III), RR Lyrae stars (IV), stellar locus 
stars (V), and high-redshift quasars (VI). 
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Fig. 8. — Color-color plot showing the location of all sources from Sesar et al. (2007) in black (small 
dots). We plot in green (circles) the location of 1875 new quasar candidates with significances > 5a, 
> 7a for 290 < R.A.< 340 (degrees). Labelled are the color-color regions A,B, and C, corresponding 
to quasars lying approximately in the redshift ranges z < 2.5, 2.5 < z < 3, and z > 3, respectively. 
Also plotted are the dashed- line demarcations from Figure [7j 
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variability cut in Sesar et al. (2007) is agnostic as to whether a source is truly a quasar; 



therefore the presence of weakly variable stars where no weakly variable quasars are found 
(in Figure [4]) indicates that approximately all quasars are intrinsically variable. 



Using the spectroscopic sample from Schneider et al. (2010), in addition to the 6537 



quasars considered above which have light curves as part of the Sesar et al. (2007) project 



there are 505 which are not contained in the Sesar et al. (2007) sample. We downloadecQthe 
light curves for these quasars. The extra fraction of potential quasars reflects uncertainty in 
the overall sample size (due, e.g., to cuts on deblending flags, etc.). We find that 35 of 285 
sources with more than 8 measurement epochs yield XFaise/^ < 2-5. Only 9 of these have 
sufficiently weak variability to not be rejected as stable by a classical xl test at the > 5a 
level. This confirms that the fraction of non-variable quasars is very small (^ 1%) and not 



impacted strongly by the Sesar et al. (2007) variability selection. 



5.2. Outliers 



We have visually inspected the light curves and spectra for the strong outliers (26 
stars and 15 quasars) identified above. Two-thirds of the quasar outliers appear to be the 
result of a mild amount of errant photometry: we note the Xqso ~ v when calculated in r 
band for these objects and Xqso ~ v also i n £7-band provided we remove 1-3 outlying (by 
> 5cr) flux measurements. This outlier identification for individual points in a quasar light 
curve is determined using the prediction for data point Xi given all the other data points 

or our software 2 ] 



l^see, e.g. 



Kozlowski & Kochanck 2010 



Of the remaining five sources, 
two have poor photometry apparently as the result of deblending from a nearby bright 
source and two have only marginally large Xqso after accounting for outliers. We retain 
one objects (SDSS J001130. 40+005751. 7), a ROSAT source which appears to truly exhibit 
excessive, particularly short-timescale, variability {Xqso/ u > 30, t d « 10 days, in this case) 
as compared to the remaining sample of ~ 6000. 



Vanden Berk (2004) find evidence for a significant increase in short-timescale variability 
for ROSAT-associated quasars. Consistently, we find a mild increase in Xqso on average for 



94 ROSAT-associated quasars in pchneider et al. (2007). Of these, 80% have Xqso > v i an d 
the median is Xqso/^ = 1-5- All but a few of these are at z < 1, which suggests the poor 
quasar fit quality is due to the standard luminosity dependent variability (Figure [3b) and 



not an anomalously high intrinsic variability. MacLeod et al. (2010) have also looked at this 



3 Using an SQL query on the Stripe 82 database hosted by SDSS at 
http://cas.sdss.org/stripe82/en/tools/search/x_sql.asp 
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issue and find no statistically significant difference in variability for Stripe 82 quasars in the 
context of the damped random walk model. 

An effective scheme for rejecting outliers from an individual quasar light curve, which 
retains the separation between quasars and stars in Figure |4j can be implemented as follows. 
First and foremost, apply no outlier rejection in calculating xlaise- Next, choose a maximum 
allowed number of outliers (say 3) and a significance level for the residual (say 5er). Calculate 
Xqso an d the model prediction for the light curve to evaluate outliers iteratively, each time 
removing only the strongest outlier. Once this is complete, evaluate the standard deviation 
of the residuals and repeat the process with a significance cutoff that is the larger of 5 or 
5 times the standard deviation. This two-step process makes it so that non-quasars receive 
little outlier rejection. 

We note that the calculation of Xqso with this outlier rejection scheme substantially 
dampens by a factor f» 2 the tail of 10% of mild outlying quasars, leaving primarily only 
the low- z quasars/AGN. The peak in Xqso a ^ so shifts down by about 10% for all quasars. 
We note that we do not utilize this outlier rejection in the paper because the Xqso tails are 
adequately small in our view. Some form of outlier rejection may be quite important when 
using less pristine photometry. 

The outlier rejection method summarized above is tailored for quasars and has no ef- 
fect on the tail of 26 strongly outlying stars in xlaise or the 10% tail of modestly outlying 
stars. Of these objects, only 4 are late-type stars (which could, in principle be rejected 
based on color as discussed above). Seven of the objects have quasar-like colors (Figure 
|7j), have targeting flags in SDSS associated with quasars, and have low-confidence (< 95% 
at best) redshift determinations. We regard these as potentially mis-labeled stars. Fifteen 
high-confidence stars remain, and nine (about 0.3% of the full sample) exhibit quasar-like 
(Xqso/^ < 2) variability. We note that one of these objects was targeted as a white dwarf 
(SDSS J234601. 89— 004255.5), and in a future study we will probe in more detail the nature 
of the other outliers and the potential that the underlying physics (e.g., the presence of an 
accretion disk) may be similar to that of quasars. 

The data from multiple filters can also be combined to obtain more robust values for 
Xqso an d Xpaise- The average of these for g and r bands leads to a small, but significant, 
improvement in the separation in Figure |4j (The addition of other bands helps little.) To 
eliminate outliers in the case of approximately simultaneous data, it may also effective to 
throw out observation epochs exhibiting strong color variations, for example > 5 times the 
standard deviation away from the median color g — r. We observe that the quasar colors are 
relatively stable in neighboring photometric bands as a function of time, although there is 
a mild dependence on redshift. There are ~ 30% color variations that occur — presumably 
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due the difference in continuum and line variability — for redshifts that place the A = 2800A 



Mgn line (also, Ivezic et al. 2004 ) in one of the passbands 



We have explored also simultaneously fitting Equation [2] to the data from 2 or more 
passbands to potentially improve statistics. One way to do this is to use the covariance 
matrix (Equation [TJ for the data from one reference passband (e.g., g-band) and then to 
allow the other bands to only vary linearly with the g-band magnitude. We find that the 
g and r band data can be fit-well in this fashion, which further indicates the approximate 
constancy of quasar g — r colors. However, the statistics (as measured by the scatter and 
source separation in Figure |4l do not improve. 



5.3. Redshift Estimation 

In Figures |3]B and [4] above, we show a tendency for increased variability with decreasing 
luminosity which maps to a trend of decreasing variability with redshift. The survey flux limit 
plays a role in this trend, and evolutionary effects are also likely at work. Our classification 
appears to perform best above z = 1 as a result. In principle, the classification can be further 
optimized to identify bursts at higher redshift (see, e.g., Figure B by performing a strict cut 
on Xqso ( see ' Section m. 

It is also possible to estimate the redshift of a quasar using either a 2 or Xqso- The 
scatter in 1 + z estimated this way relative to the true 1 + z is large, a factor of 2. There are 
also likely strong selection effects at play. The flux limit prevents high redshift bursts from 



exhibiting strong variability. That is, the Malmquist (1922) bias induces a time-domain 



bias. The survey color-based selection may also play a role. We caution the reader that 
selection effects that define a given survey may also strongly impact the utility of this redshift 
estimation. It is not clear at present that useful redshift constraints can be obtained from 
our variability measures. 



Conclusions 



We have explored a parameterization of the ensemble quasar variability structure func- 



tion using the damped random walk model (Kelly et al. 2009). This enables a statistically 



rigorous evaluation of the fit of an individual quasar to the expected sample average vari- 
ability profile. The latter step provides, essentially, a classification between objects undergo- 
ing quasar-like variability and objects exhibiting temporally uncorrelated variability. Unlike 
previous work, the classification requires no parameter fitting, is essentially free from survey- 
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specific peculiarities, and appears to be very robust in separating known variable stars from 
quasars. Nearly all (> 99%) known quasars show the expected variability profile and can be 
cleanly separated from stars, with ^3% contamination by rare variable stars. 

Our variability-selected quasar candidates span a range of redshifts, including a fac- 
tor of nearly two increase in rates for intermediate redshifts (2.5 < z < 3), where color- 
based selection performs poorly (e.g., Richards et al. |[2006 ) . The classification performs well 



also at high-redshift, and can be tuned to yield > 98% completeness for z > 3 quasars. 
We potentially increase the sample of quasars in Stripe 82 by 10-25%, depending upon 
whether we compare to color-selected (but not spectroscopically-confirmed) quasars or to 
spectroscopically-confirmed quasars only. Most of the new quasars are faint (i > 19 mag). 
Relatively independent of these considerations, we increase the quasar fraction by 33% or 
more for z ^ 2.5. Software to perform the classification as well as the list of candidate 
quasars can be downloaded from the project webpage 2 . 

This work has been conducted under the auspices of a broader project (The Time- 



domain Classification Project; Bloom et al. 2008), a goal of which is to characterize the 



allowed range of variability versus timescale for the full diversity of astrophysical objects. 
We have shown above how the temporal profile of quasars stands out against the signature 
of variable stars, predominantly on long timescales but with strong power to separate strong 
variables from quasars on short timescale. Future work will apply the methods outlined 
above to select highly extinguished and high-redshift quasars for spectroscopic followup and 
also to embed the methods discussed above withing the broader TCP framework of variable 
object classification. Of particular interest will be additional tests of schemes (outlined 
above) to reject photometric outliers and to apply the classification in the limit of very few 
data points for wide-field, and in particular real-time surveys, for example using data from 



the Palomar Transients Factory (Rau et al. 2009). 



The "one-versus-many" classification framework shown here should prove to be an im- 
portant pillar of any survey that relies on time-domain photometric observations for target 
selection (even if the targets of interest are not quasars). It appears that with the appro- 
priate selection of cadences, the time-domain identification of quasars is now more robust, 
complete and pure compared with color selections. This has important implications for sur- 
vey strategies of LSST, the Synoptic All-Sky Infra- Red survey (SASIR; |Bloom et al. 2009), 



and other time-domain projects (e.g., WFIRST): when faced with a choice between more 
colors and a wider field in a given filter, the latter option is preferred. With deep imaging 
in one blue band (U or g), coupled with synoptic imaging in a redder band, we believe that 
high-redshift quasars may be most effectively discovered. 
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Table 3: Source Region Counts from Candidate Quasars Colors 



Color Region 


jfc Known 


New (% Known) 


New i < 19 


New i > 19 


I (white dwarfs) 


5 


2, 1 (40%, 20%) 


0, 


2, 1 (no previous) 


II (low-z QSOs) 


5866 


1387, 239 (24%, 4%) 


79, 34 (6%, 2%) 


1308, 205 (29%, 5%) 


III (dm/ WD) 


134 


77, ^7(57%, 35%) 


25, 19 (50%, 38%) 


52, 28 (62%, 55%) 


IV (RR Lyrae) 


158 


98, 62 (62%, 39%) 


6, 5 (25%, 21%) 


92, 57 (69%, ^5%) 


V (stars) 


252 


257, 210 (102%, 83%) 


49, 48 (80%, 70%) 


208, Iftg (109%, 85%) 


VI (high-z QSOs) 


158 


54, 36 (34%, 23%) 


6, 4 (16%, 11%) 


48, 32 (40%, 27%) 



Notes: The color regions I, II, III, IV, V, and VI correspond to the typical location of objects in 
parentheses in the first column (see Figure [7]) . Columns 3-5 report the number of new quasars 
relative to spectroscopically-confirmed quasars, followed after a comma by the number (italicized) 



of new quasars not already color-selected in Richards et al. (2009) or spectroscopically-confirmed. 



